ABSTRACT

Increasing the automaticity of proofs in deductive verification of C programs is a challenging task. When applied to industrial C programs, known heuristics for generating simpler verification conditions are not efficient enough. This is mainly due to the size of these conditions and to their high number of irrelevant hypotheses.

This work presents a strategy to reduce program verification conditions by selecting their relevant hypotheses. The relevance of a hypothesis is determined by the combination of a syntactic analysis and two graph traversals. The first graph is labeled by constants and the second one by the predicates in the axioms. The approach is applied to a benchmark arising in industrial program verification.

Categories and Subject Descriptors 

D.2.4 [Software Engineering]: Software/Program Verification

General Terms 

Verification, Experimentation 

Keywords 

Proof, hypothesis selection 

1. INTRODUCTION 

Deductive software verification aims at verifying program properties with the help of theorem provers. It has gained more interest with the increased use of embedded software, for instance in airplane controls, cars or smart cards, which requires a high level of confidence.

In the Hoare logic framework, program properties are expressed by first-order logical assertions on program variables (preconditions, postconditions, invariants, ...). The deductive verification method consists of transforming a program, annotated with sufficiently many assertions, into so-called verification conditions (VCs) that, when proved, establish that the program satisfies its assertions. In the KeY system [2], a special-purpose logic and calculus are used to prove these verification conditions. The drawback of this approach is that it is specific to a programming language and a target prover. In contrast, a multi-prover approach is followed by effective tools such as ESC/Java [10] for Java programs annotated using the Java Modeling Language [4], Boogie [1] for the C# programming language, and Caduceus/Why [12] for C programs. The latter also accepts Java as input programming language.

A theorem prover is invoked to establish the validity of each verification condition. One of the challenges in deductive software verification is to automatically discharge as many verification conditions as possible. A key issue is that the whole context of a verification condition is a huge set of axioms modelling not only the property and the program under verification, but also many features of the programming language. Simply passing this large context to an automated prover induces a combinatorial explosion, preventing the prover from terminating in reasonable time.

Possible solutions to reduce the VC size and complexity are to optimize the memory model (e.g. by introducing separations of pointer zones [16]), to improve the weakest precondition calculus [17] and to apply strategies for simplifying VCs [14, 8, 18]. This work focuses on the latter. We suggest heuristics to select the axioms fed to automated theorem provers (ATPs). Instead of blindly invoking ATPs with a large VC, we present reduction strategies that significantly prune their search space. The idea behind these strategies is quite natural: an axiom is relevant if a prover applies it successfully, i.e. without diverging, to establish the conclusion. Relevance criteria are computed by the combined traversal of two graphs representing symbol dependencies within axioms. In the graph of constants, edges represent the joint presence of two constants in some ground axiom. In the graph of predicates, arcs represent logical dependencies between predicates occurring in the same axiom.

In earlier work [5], selection was limited to ground hypotheses, and comparison predicates were not taken into account. This led to unsatisfactory results, for instance when the conclusion is an equality between terms. The present work extends selection to context axioms, comparison predicates and hypotheses with quantifiers. We propose new heuristics that increase the number of automatically discharged VCs.



The plan of the article is as follows. Section 2 presents the industrial C example that motivated this work. This case study is part of the Oslo [3] secure bootloader, annotated with a safety property. Section 3 presents the general structure of a verification condition. Section 4 shows how dependencies are stored in graphs. The selection strategy of hypotheses is presented in Section 5. These last two sections are the first contribution. The second contribution is the implementation of this strategy as a module of Caduceus/Why [12]. Section 6 presents experimental results. Section 7 discusses related work, concludes and presents future work.

2. TRUSTED PLATFORM CASE STUDY 

The context of the PFC project on Trusted Computing (TC) poses new challenges for axiom filtering. PFC (meaning trusted platforms in French) is one of the SYSTEM@TIC Paris Region French cluster projects. The main idea of the TC approach is to gain confidence about the execution context of a program. This confidence is obtained by construction, using a trusted chain. A trusted chain is a chain of executions where each launched program is previously registered with a tamper-proof component, such as the Trusted Platform Module (TPM) hardware chipset. In this TC context, we focus on the Oslo [3] secure loader. This program is the first step of a trusted chain. It uses hardware functionalities of recent CPUs (AMD-V or Intel TXT technologies) to initialize the chain and to launch its first program.

The main trusted chain properties are temporal, but recent works [13, 15] propose a method to translate a temporal property into first-order logic annotations in the code. This method is systematic and generates a large number of VCs, including quantifications and arrays with many links between them. Therefore, this approach is a good generator of VCs with a medium or low level of automaticity. Table 1 gives some factual information about the studied part of Oslo. The VCs of this benchmark are publicly available [24].

Oslo program and specification
    Code              ~1500 lines
    Specification     ~1500 lines (functional)
    Number of VCs     ~7300 VCs

Observed part of Oslo
    Observed code     218 lines
    Specification     ~1400 lines (functional and generated)
    Number of VCs     771 VCs

Table 1: Some Metrics about the Oslo Program

3. VERIFICATION CONDITIONS 

The verification conditions (VCs) we consider are first-order formulae whose validity implies that a piece of annotated source code satisfies some property. This section describes the general structure of VCs generated by Caduceus/Why. A VC is composed of a context and a goal. This structure is illustrated in Fig. 1.

The context depends on the programming language. It is a first-order axiomatization of the language features used in the program under verification. Typical features are data types or a memory model, enriched to allow the specification of, e.g., separated pointer regions. For instance, a typical VC produced by Caduceus/Why has a context with more than 80 axioms.

$$\mathit{Context} \Rightarrow \underbrace{(\mathit{Hypotheses} \Rightarrow \mathit{Conclusion})}_{\mathit{Goal}}$$

Figure 1: Structure of verification conditions

VCs are generated in the input format of many first-order ATPs, including Simplify [9] and SMT solvers [6]. The Simplify automatic prover has a specific input language. SMT solvers such as Alt-Ergo and Yices share a common input language; Alt-Ergo, however, is given VCs in the Why input language for better efficiency. For SMT solvers, the context is presented as a base theory, usually a combination of equality with uninterpreted function symbols and linear arithmetic, extended with a large set of specific axioms.

The goal depends on the program and on the property under 
verification. When this property is an assertion about a 
given program control point, the goal is generated by the 
weakest precondition (wp) calculus of Dijkstra [11] at that 
control point. The goal is considered as a conclusion implied 
by hypotheses that encode the program execution up to the 
control point. 



Running example. Consider the following function: 



    struct p {
      int x;
    } p;

    struct t {
      struct p v[2];
    } t;

    /*@ requires \valid(a) &&
      @   (\forall int i; 0 <= i <= 1 => \valid(a->v[i]))
      @ assigns a->v[0].x */
    void f(struct t *a) {
      a->v[0].x = 2;
    }



The requires annotation specifies a precondition, and the assigns annotation means that function f modifies no location other than a->v[0].x. The hypotheses of the generated VC are

$$
\begin{array}{l}
\mathit{valid}(a),\\
(\forall i{:}\mathit{int}.\ 0 \le i \le 1 \Rightarrow \mathit{valid}(a, \mathit{shift}(\mathit{acc}(m_v, a), i)) \wedge \mathit{valid\_acc}(m_{pPM})),\\
\mathit{valid\_acc\_range}(m_v, 2),\\
\mathit{separation1\_range}(m_v, 2),\\
\mathit{valid\_acc}(m_v),\\
r = \mathit{acc}(m_v, a),\\
r_0 = \mathit{shift}(r, 0), \text{ and}\\
m_x\_0 = \mathit{upd}(m_x, r_0, 2).
\end{array}
$$

The conclusion is

$$\mathit{not\_assigns}(m_x, m_x\_0, \mathit{singleton}(\mathit{acc}(m_v, a))).$$

The meaning of these formulae is as follows. $m_{pPM}$ is the pointer (P) memory (M) for the structures of type p. $\mathit{valid\_acc}(m)$ means that the memory $m$ is initialized, i.e. that this memory is accessible from any valid pointer in the allocation table. The first two hypotheses correspond to the precondition. In the next two hypotheses, the predicates $\mathit{valid\_acc\_range}(m_v, 2)$ and $\mathit{separation1\_range}(m_v, 2)$ respectively mean that any access to the memory $m_v$ returns an array $t$ such that the pointers $t[0]$ and $t[1]$ are valid and $t[0] \ne t[1]$. The last three hypotheses come from a flattening-like decomposition of the statement a->v[0].x = 2 performed by the VC generator. The function $\mathit{shift}(t, i)$ gives access to index $i$ in the array $t$. The conclusion translates the assigns annotation into a relation between two memory values: $m_x$ is the value of memory x before the execution of f, and $m_x\_0$ is its value after the execution of f. The third parameter is the representation of a->v[0]. Our preprocessor eliminates the last three hypotheses and the intermediate constants that they introduce, by considering that the conclusion is

$$\mathit{not\_assigns}(m_x, \mathit{upd}(m_x, \mathit{shift}(\mathit{acc}(m_v, a), 0), 2), \mathit{singleton}(\mathit{acc}(m_v, a))).$$

4. GRAPH-BASED DEPENDENCY 

Basically, a conclusion is a propositional combination of potentially quantified predicates over some terms. Dependencies between axioms and the conclusion can thus arise from terms and from predicates. Terms in the goal may either come from the annotated program (from statements or assertions) or result from a weakest precondition calculus applied to the program and its assertions. Term dependency simply records that parts of the goal (in particular, hypotheses and conclusion) share common terms; it is presented in Section 4.1. Two predicates are dependent if there is a deductive path leading from one to the other; predicate dependency is presented in Section 4.2. Finally, Section 4.3 presents a special dependency analysis for comparison predicates.

4.1 Term Dependency 

Following previous work [5], and in order to describe how hypotheses connect terms together, an undirected graph $G_c$ is constructed by syntactic analysis of term occurrences in each hypothesis of a VC. The graph vertices are labeled with the constants occurring in the goal and with new constants resulting from the following flattening-like process. A fresh constant $f\_i$, where $i$ is some unique integer, is created for each term $f(t_1, \ldots, t_n)$ in the goal. For each $j$ with $1 \le j \le n$, there is a graph edge between the vertex labeled $f\_i$ and the vertex labeled $c$, where $c$ is $t_j$ itself if $t_j$ is a constant, and $c$ is the fresh constant created for $t_j$ if $t_j$ is a compound term.

Running example. An excerpt of the graph representing the VC presented in Section 3 is given in Fig. 2. The vertices shift_6 and acc_7 come from the second hypothesis, and the other vertices come from the conclusion (C).




Figure 2: Example of Constant Dependency Graph 
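The flattening-like construction of $G_c$ can be sketched in C as follows. This is a minimal illustration under assumed data structures: `Term`, `ConstGraph`, `flatten` and `has_edge` are hypothetical names, not those of the Why implementation (which is written in OCaml). Fresh constants are numbered in preorder, which reproduces the names upd_1, shift_2, acc_3, singleton_4 and acc_5 used for the running example in Section 5.1. We also assume that integer literals are not vertices of $G_c$, since they do not appear in $C_0$.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_V 64
#define MAX_E 128
#define MAX_ARGS 4

/* Hypothetical term representation: a constant has arity 0. */
typedef struct Term {
    const char *sym;
    int arity;
    const struct Term *args[MAX_ARGS];
} Term;

/* Undirected graph of constants G_c, stored as an edge list. */
typedef struct {
    char label[MAX_V][32];
    int nv;
    int edge[MAX_E][2];
    int ne;
    int fresh;                 /* counter used to name fresh constants f_i */
} ConstGraph;

/* Returns the vertex labeled `label`, creating it if needed. */
static int vertex(ConstGraph *g, const char *label) {
    for (int k = 0; k < g->nv; k++)
        if (strcmp(g->label[k], label) == 0)
            return k;
    assert(g->nv < MAX_V);
    snprintf(g->label[g->nv], sizeof g->label[0], "%s", label);
    return g->nv++;
}

static void add_edge(ConstGraph *g, int u, int v) {
    assert(g->ne < MAX_E);
    g->edge[g->ne][0] = u;
    g->edge[g->ne][1] = v;
    g->ne++;
}

/* Assumption: integer literals are not vertices of G_c. */
static int is_numeral(const char *s) { return s[0] >= '0' && s[0] <= '9'; }

/* Flattens t into g and returns its vertex, or -1 for a numeral. Each
   compound term f(t1,...,tn) gets a fresh constant f_i (numbered in
   preorder) linked to the vertex of every argument t_j. */
static int flatten(ConstGraph *g, const Term *t) {
    if (t->arity == 0)
        return is_numeral(t->sym) ? -1 : vertex(g, t->sym);
    char name[32];
    snprintf(name, sizeof name, "%s_%d", t->sym, ++g->fresh);
    int v = vertex(g, name);
    for (int j = 0; j < t->arity; j++) {
        int c = flatten(g, t->args[j]);
        if (c >= 0)
            add_edge(g, v, c);
    }
    return v;
}

/* Undirected edge test by vertex labels. */
static int has_edge(const ConstGraph *g, const char *a, const char *b) {
    for (int k = 0; k < g->ne; k++) {
        const char *x = g->label[g->edge[k][0]], *y = g->label[g->edge[k][1]];
        if ((!strcmp(x, a) && !strcmp(y, b)) || (!strcmp(x, b) && !strcmp(y, a)))
            return 1;
    }
    return 0;
}
```

Flattening the three arguments of the preprocessed conclusion (m_x, the upd term and the singleton term) in that order yields the eight vertices m_x, upd_1, shift_2, acc_3, m_v, a, singleton_4 and acc_5. Because constants are shared vertices while every compound occurrence gets its own fresh constant, the two occurrences of acc(m_v, a) become the distinct vertices acc_3 and acc_5, both linked to m_v and a.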

4.2 Predicate Dependency 

A weighted directed graph is constructed to represent implication relations between predicates efficiently. Intuitively, each graph vertex represents a predicate name, and an arc from a vertex p to a vertex q means that p may imply q. What follows details how to compute this graph of predicates, named $G_P$. This section describes the general approach; the next section adds a special treatment for comparison predicates.

First, each context axiom is decomposed into conjunctive normal form (CNF). This is done in a straightforward way (in contrast to optimised CNF decomposition [19]): axioms are short and their transformation into CNF does not yield a combinatorial explosion. The resulting clauses are called axiom clauses. Each graph vertex is labeled with a predicate symbol that appears in at least one literal of the context. If a predicate p appears negated (as $\neg p$) in an axiom clause, it is represented by a vertex labeled $\bar{p}$. A clause is considered as a set of literals. For each axiom clause $Cl$ and each pair $(l, l') \in Cl \times Cl$ of distinct literals in this clause, there is an arc in $G_P$ depending on the polarity of $l$ and $l'$. There are three distinct cases to consider, modulo symmetry. They are enumerated in Table 2, where p and q are two distinct predicates. To reduce the graph size, the contrapositive of each implication is not represented as an arc in the graph but is taken into account when traversing it, as detailed in Section 5.2.

The intended meaning of an arc weight is that the lower the weight, the higher the probability of establishing q from p. Therefore, the arc introduced for the pair (p, q) following Table 2 is labeled with the number of literals minus one ($\mathrm{card}(Cl) - 1$) in the clause $Cl$ under consideration. For instance, a large clause with many negative literals, $\neg p$ among them, and with many consequents, q among them, is less useful for a deduction step leading to q than the smaller clause $\{\neg p, q\}$. Finally, two weighted arcs $p \xrightarrow{w_1} q$ and $p \xrightarrow{w_2} q$ are replaced with the single weighted arc $p \xrightarrow{\min(w_1, w_2)} q$.

    Value of the (l, l') pair      Arcs
    $(\neg p, q)$                  $\{p \to q\}$
    $(p, q)$                       $\{\bar{p} \to q\}$
    $(\neg p, \neg q)$             $\{p \to \bar{q}\}$

Table 2: Translating Pairs of Literals into Arcs.

Figure 3: Example of Predicate Dependency Graph
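The arc construction of Table 2, with its weights and the min-merging rule, can be sketched in C as follows. This is only an illustration under assumed encodings: predicates are small integers, the vertex of a literal $p$ is $2p$ and the vertex of $\neg p$ (labeled $\bar p$) is $2p+1$, so complementing a vertex flips its lowest bit; `Literal`, `PredGraph` and `add_clause` are hypothetical names. Each unordered pair of literals $l, l'$ of a clause yields one arc from the complement of $l$ to $l'$, i.e. the implication $\neg l \Rightarrow l'$ read off the clause, which covers the three cases of Table 2 up to contraposition.

```c
#include <assert.h>
#include <limits.h>

#define MAX_PRED 16
#define NODES (2 * MAX_PRED)
#define NO_ARC INT_MAX

/* A literal: predicate index and polarity. */
typedef struct { int pred; int negated; } Literal;

/* Vertex of a literal: p -> 2p, (not p) -> 2p + 1, i.e. the vertex labeled p-bar. */
static int node_of(Literal l) { return 2 * l.pred + (l.negated ? 1 : 0); }
static int complement(int node) { return node ^ 1; }

/* Weighted directed graph G_P as an adjacency matrix; NO_ARC means no arc. */
typedef struct { int w[NODES][NODES]; } PredGraph;

static void pred_graph_init(PredGraph *g) {
    for (int u = 0; u < NODES; u++)
        for (int v = 0; v < NODES; v++)
            g->w[u][v] = NO_ARC;
}

/* Adds the arcs of one axiom clause of n literals. Every arc gets weight
   card(Cl) - 1 = n - 1; parallel arcs keep the minimal weight. */
static void add_clause(PredGraph *g, const Literal *cl, int n) {
    int weight = n - 1;
    for (int a = 0; a < n; a++)
        for (int b = a + 1; b < n; b++) {
            if (cl[a].pred == cl[b].pred)
                continue;                           /* Table 2 relates distinct predicates */
            assert(cl[a].pred < MAX_PRED && cl[b].pred < MAX_PRED);
            int from = complement(node_of(cl[a])); /* not l  =>  l' */
            int to = node_of(cl[b]);
            if (weight < g->w[from][to])
                g->w[from][to] = weight;
        }
}
```

With the clause $\{\neg p, q\}$, this adds $p \xrightarrow{1} q$; adding afterwards $\{\neg p, \neg r, q\}$ leaves that arc at weight 1 and adds $p \xrightarrow{2} \bar r$ and $r \xrightarrow{2} q$. Contrapositive arcs are deliberately not stored, as stated above.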

Running example. Figure 3 represents the dependency 
graph corresponding to the definition of predicates valid, 
not_assigns and valid_acc. It is an excerpt of the graph rep- 
resenting the memory model of Caduceus/Why. 

4.3 Handling Comparison Predicates 

In earlier work [5], equalities and inequalities were ignored when recording predicate dependencies. This leads to unsatisfactory results when (in)equality is central for deduction, e.g. when the conclusion is an equality between terms. If we handle equality like the other predicates, the process of Section 4.2 connects too many vertices to the vertex labeled =. We have observed that the resulting reduction of the graph diameter has a negative impact on the quality of selection.

More generally, the present section suggests a special construction of graph vertices and edges for comparison predicates. A comparison predicate is an equality $=$, an inequality $\ne$, a (reflexive) order relation ($\le$ or $\ge$) or an irreflexive pre-order ($<$ or $>$). The keys of this construction are the support of types and the exploitation of some causalities between comparison predicates.

Figure 4: Some Axioms Relating Comparison Predicates

4.3.1 Typed comparisons 

Each comparison predicate $\diamond$ is written $\diamond_t$, where $\diamond$ is $=$, $\ne$, $<$, $\le$, $>$ or $\ge$ and $t$ is the type of the operands of $\diamond$. For simplicity, the focus is on the types $t$ where $\le_t$ and $<_t$ are total orders, $\ge_t$ and $>_t$ are their respective reverse orders, and $\le_t$ is the union of $<_t$ and $=_t$. A typical example is the type int of integers.

Each comparison $t_1 \diamond_t t_2$ present in at least one axiom is represented by two nodes, respectively labeled with $\diamond_t$ and $\overline{\diamond_t}$, where $\overline{=_t}$, $\overline{\ne_t}$, $\overline{<_t}$, $\overline{\le_t}$, $\overline{>_t}$ and $\overline{\ge_t}$ respectively are $\ne_t$, $=_t$, $\ge_t$, $>_t$, $\le_t$ and $<_t$. For instance, the two nodes $\le_{int}$ and $>_{int}$ represent a total order on integers and its negation. These labels are called the typed comparison predicates.

Apart from this difference in the definition of $\overline{\diamond_t}$, the arcs connected to typed comparison predicates are constructed following the general rules described in Table 2.
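The complement relation between typed comparison predicates fits in a small enumeration. In this hypothetical C encoding (the names `Cmp`, `TypedCmp` and `cmp_negate` are ours), each comparison is placed right next to its negation, so that $\overline{\diamond_t}$ is obtained by flipping the lowest bit, mirroring the $p$ / $\bar p$ vertex pairs used for the other predicates; the operand type $t$ is carried as a separate tag and left unchanged.

```c
#include <assert.h>

/* Comparisons ordered as (comparison, negation) pairs: = / !=, < / >=, <= / >. */
typedef enum { CMP_EQ, CMP_NE, CMP_LT, CMP_GE, CMP_LE, CMP_GT } Cmp;

/* A typed comparison predicate: operator plus an operand type tag. */
typedef struct { Cmp op; int type; } TypedCmp;

/* Label of the second node of a typed comparison: its negation, same type. */
static TypedCmp cmp_negate(TypedCmp c) {
    assert(c.op <= CMP_GT);
    TypedCmp r = { (Cmp)(c.op ^ 1), c.type };
    return r;
}
```

For instance, negating $\le_{int}$ gives $>_{int}$ and negating $<_{int}$ gives $\ge_{int}$, and negating twice always returns the original comparison, as required by the convention $\overline{\overline{p}} = p$ used in Section 5.2.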

4.3.2 Causalities between comparison predicates 
Verification conditions are expressed as SMT problems in the AUFLIA logic [22]. Since the comparison predicates between integers are interpreted in AUFLIA, no context axiom contributes to their definition. Figure 4 suggests a list of such defining axioms. To lighten the figure, the predicates are not indexed with int.

Adding these axioms to the context would be counterproductive. We propose instead to analyze them to enrich the predicate graph as if they were in the context. Since the axiom selection algorithm does not take loops into account, the only arcs of interest in the predicate graph are those between distinct nodes. This rules out internal properties like reflexivity, irreflexivity, symmetry or transitivity, which relate a comparison predicate to itself and would only yield loops. This is the reason why Figure 4 is limited to axioms between distinct predicates. The symmetric axioms, where $\le$ and $<$ respectively replace $\ge$ and $>$, are also treated but are not reproduced. The arcs resulting from the application of the rules of Table 2 to those ten axioms are added to the graph of predicates.
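As a worked illustration, take the integer property $x < y \Rightarrow x \le y$. It relates two distinct comparison predicates, so it is of the kind Figure 4 is restricted to; since the figure's axioms are not reproduced here, we only assume it is representative. Its CNF is a single two-literal clause, so Table 2 (case $(\neg p, q)$) yields one arc of weight $\mathrm{card}(Cl) - 1 = 1$:

$$
\forall x, y{:}\mathit{int}.\ x <_{int} y \Rightarrow x \le_{int} y
\quad\leadsto\quad
\{\neg(x <_{int} y),\ x \le_{int} y\}
\quad\leadsto\quad
<_{int} \xrightarrow{\,1\,} \le_{int}
$$

Its contrapositive $\overline{\le_{int}} \to \overline{<_{int}}$, that is $>_{int} \to \ge_{int}$, is not stored as an arc but is followed when the graph is traversed.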

5. AXIOM SELECTION 

Relevant axioms remain to be selected. Intuitively, an axiom is relevant with respect to a conclusion if a proof that needs this axiom can be found. Variables and predicates included in a relevant axiom are also called relevant.

Section 5.1 shows how to select relevant constants, Section 5.2 how to select relevant predicates, and Section 5.3 how to combine these results to select relevant axioms. A selection strategy is presented as an algorithm in Section 5.4.

5.1 Relevant Constants 

A node in the graph of constants $G_c$ is identified with its labeling constant. Let $n$ be the diameter of $G_c$. Starting from the set $C_0$ of constants in the conclusion, a breadth-first search algorithm computes the sets $C_i$ of constants in $G_c$ that are reachable from $C_0$ in at most $i$ steps ($1 \le i \le n$). Finally, unreachable constants are added to the limit of the sequence $(C_n)_{n \in \mathbb{N}}$ for completeness. Let $C_\infty$ be the resulting set.
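A possible C sketch of this breadth-first layering, on an adjacency-matrix form of $G_c$ with hypothetical names (`ConstAdj`, `constant_layers`). For every constant it computes the smallest $i$ such that the constant belongs to $C_i$; constants that stay unreachable are marked `UNREACHABLE` and only enter $C_\infty$. It is plain breadth-first search, without the multi-link priority heuristic described next, so its layers can differ from those given for the running example. The function returns the last layer at which $C_i$ still grows; the diameter $n$ used above is an upper bound for it.

```c
#include <assert.h>
#include <limits.h>

#define MAX_C 32
#define UNREACHABLE INT_MAX

/* Hypothetical adjacency-matrix form of the undirected graph G_c. */
typedef struct {
    int n;
    unsigned char adj[MAX_C][MAX_C];
} ConstAdj;

static void add_undirected(ConstAdj *g, int u, int v) {
    g->adj[u][v] = g->adj[v][u] = 1;
}

/* Breadth-first layering from the conclusion constants C_0 (the seeds):
   layer[v] is the smallest i such that v belongs to C_i, or UNREACHABLE.
   Returns the largest finite layer, after which C_i no longer grows. */
static int constant_layers(const ConstAdj *g, const int *seeds, int nseeds, int *layer) {
    int queue[MAX_C], head = 0, tail = 0, depth = 0;
    assert(g->n <= MAX_C);
    for (int v = 0; v < g->n; v++)
        layer[v] = UNREACHABLE;
    for (int s = 0; s < nseeds; s++)
        if (layer[seeds[s]] == UNREACHABLE) {
            layer[seeds[s]] = 0;
            queue[tail++] = seeds[s];
        }
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < g->n; v++)
            if (g->adj[u][v] && layer[v] == UNREACHABLE) {
                layer[v] = layer[u] + 1;
                if (layer[v] > depth)
                    depth = layer[v];
                queue[tail++] = v;
            }
    }
    return depth;
}
```

$C_i$ is then the set of constants $v$ with `layer[v] <= i`, and $C_\infty$ contains every vertex of the graph.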

To introduce more granularity in the computation of reachable constants, we propose as a heuristic to insert nodes that are linked several times before nodes that are linked only once. Semantically, this gives priority to constants that are closer to the conclusion. Notice that, in this case, the index $i$ of $C_i$ no longer corresponds to a path length.

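Running example. The sequence of reachable constant sets associated with the graph in Fig. 2 is:

$$
\begin{array}{l}
C_0 = \{m_x, m_v, a\},\\
C_1 = C_0 \cup \{\mathit{acc\_3}, \mathit{acc\_5}, \mathit{acc\_7}\},\\
C_2 = C_1 \cup \{\mathit{singleton\_4}, \mathit{shift\_2}\},\\
C_3 = C_2 \cup \{\mathit{shift\_6}\},\\
C_4 = C_3 \cup \{\mathit{upd\_1}\} \text{ and}\\
C_\infty = C_4.
\end{array}
$$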

5.2 Relevant Predicates 

A predicate p is identified with the vertex labeled p, and its negation with the vertex labeled $\bar{p}$, in the graph of predicates $G_P$. A predicate symbol p is relevant w.r.t. a predicate symbol q if there is a path from p to q in $G_P$, or dually from q to p. Intuitively, the lower the path weight, the higher the probability that p establishes q. Relevant predicates extracted from $G_P$ are stored in an increasing sequence $(L_n)_{n \in \mathbb{N}}$ of sets. The natural number n is the maximal weight of paths considered in the graph of predicates.

We now present how $L_n$ is computed. The conclusion is assumed to be a single clause. $L_0$ gathers the predicates from the conclusion. For each predicate symbol p that is not in $L_0$, a graph traversal computes the paths with the minimal weight $w$ from p to some predicate in $L_0$.

Furthermore, the contrapositive of each implication is taken into account: let $p_1$ and $p_2$ be two node labels, each corresponding either to a positive or to a negative literal. If the arc $p_1 \to p_2$ is taken into account, its counterpart $\overline{p_2} \to \overline{p_1}$ is too, with the convention that $\overline{\overline{p}}$ is $p$. Let $n$ be the largest of these minimal distances, i.e. the distance from the deepest reachable predicate to $L_0$. For $1 \le i \le n$, $L_i$ is the set of vertices of $G_P$ whose distance to $L_0$ is less than or equal to $i$. $L_\infty$ is the limit $\bigcup_{i \ge 0} L_i$ augmented with the vertices from which $L_0$ is not reachable.
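The computation of the $L_i$ can be sketched as a multi-source shortest-path search run backwards from $L_0$. In this hypothetical C encoding (`WGraph`, `eff_arc` and `pred_distances` are our names), vertex $2p$ stands for $p$ and $2p+1$ for $\bar p$, so $\bar v$ is `v ^ 1`; a stored arc $u \to v$ also provides its contrapositive $\bar v \to \bar u$ without being duplicated. We assume a simple quadratic Dijkstra, which fits small predicate graphs; `dist[v]` is the minimal path weight from `v` to $L_0$, and vertices left at `INF` are those that only enter $L_\infty$.

```c
#include <assert.h>
#include <limits.h>

#define MAXN 32     /* even: vertices 2p and 2p+1 encode p and p-bar */
#define INF INT_MAX

typedef struct { int n; int w[MAXN][MAXN]; } WGraph;   /* INF = no arc */

static void wgraph_init(WGraph *g, int n) {
    assert(n % 2 == 0 && n <= MAXN);
    g->n = n;
    for (int u = 0; u < MAXN; u++)
        for (int v = 0; v < MAXN; v++)
            g->w[u][v] = INF;
}

/* Adds or merges an arc u -> v, keeping the minimal weight. */
static void wgraph_arc(WGraph *g, int u, int v, int weight) {
    if (weight < g->w[u][v])
        g->w[u][v] = weight;
}

/* Weight of the effective arc x -> u: the stored arc itself, or the
   contrapositive of a stored arc u-bar -> x-bar. */
static int eff_arc(const WGraph *g, int x, int u) {
    int a = g->w[x][u], b = g->w[u ^ 1][x ^ 1];
    return a < b ? a : b;
}

/* dist[v] = minimal weight of a path from v to some vertex of L_0. */
static void pred_distances(const WGraph *g, const int *l0, int nl0, int *dist) {
    int done[MAXN] = {0};
    for (int v = 0; v < g->n; v++) dist[v] = INF;
    for (int k = 0; k < nl0; k++) dist[l0[k]] = 0;
    for (;;) {
        int u = -1;                       /* closest vertex not yet settled */
        for (int v = 0; v < g->n; v++)
            if (!done[v] && dist[v] != INF && (u < 0 || dist[v] < dist[u]))
                u = v;
        if (u < 0)
            break;
        done[u] = 1;
        for (int x = 0; x < g->n; x++) {  /* relax every x with an arc x -> u */
            int c = eff_arc(g, x, u);
            if (c != INF && dist[u] + c < dist[x])
                dist[x] = dist[u] + c;
        }
    }
}
```

$L_i$ is then the set of vertices with `dist[v] <= i`. For example, with $L_0 = \{q\}$ and the single stored arc $\bar q \to s$, the vertex $\bar s$ is at distance 1 only thanks to the contrapositive $\bar s \to q$. Setting every stored weight to 1 gives a weight-agnostic variant, in the spirit of the option of Section 6.1 that ignores arc weights.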

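Running example. From the predicate graph of the running example, depicted in Fig. 3 without the comparison predicates for lack of space, the first five sets of reachable predicates are

$$
\begin{array}{l}
L_0 = \{\mathit{not\_assigns}\},\\
L_1 = L_0 \cup \{\mathit{valid}, \mathit{not\_in\_pset}, =\},\\
L_2 = L_1 \cup \{\le_{int}, \mathit{valid\_acc}, <_{int}\},\\
L_3 = L_2 \cup \{\overline{\mathit{valid\_acc}}, \ge_{int}, >_{int}\} \text{ and}\\
L_4 = L_3 \cup \{\overline{=}, \overline{\mathit{not\_in\_pset}}, \overline{\mathit{valid}}, =_{int}, \ne_{int}\}.
\end{array}
$$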

5.3 Selection of Relevant Axioms 

In this section, we present the main principles of the axiom 
selection combining predicate and constant selection. A first 
part describes hypothesis selection and a second one extends 
the approach to axioms from the context. 

Let $(L_n)_{n \in \mathbb{N}}$ and $(C_n)_{n \in \mathbb{N}}$ respectively be the sequences of relevant predicate and constant sets. Let i be a counter which represents the depth of predicate selection. Similarly, let j be a counter corresponding to the depth of constant selection.

5.3.1 Hypothesis Selection 

Let $Cl$ be a clause from a hypothesis. Let $V$ be the set of constants of $Cl$, augmented with the constants resulting from flattening (see Section 4.1). Let $P$ be the set of predicates of $Cl$. The clause $Cl$ should be selected if it includes constants or predicates that are relevant with respect to the conclusion. Different criteria on the sets $P$ and $V$ can be used to check this. Possible choices are, in increasing order of selectivity:

1. the clause includes at least one relevant constant or one relevant predicate:

   $$V \cap C_j \ne \emptyset \ \vee\ P \cap L_i \ne \emptyset$$

2. the clause includes more than a threshold $t_v$ of relevant constants or more than a threshold $t_p$ of relevant predicates:

   $$\mathrm{card}(V \cap C_j) / \mathrm{card}(C_j) > t_v \ \vee\ \mathrm{card}(P \cap L_i) / \mathrm{card}(L_i) > t_p$$

3. all the clause constants and clause predicates are relevant:

   $$V \subseteq C_j \ \wedge\ P \subseteq L_i$$

Our experiments on these criteria have shown that a criterion that is too weak defeats its purpose: too many clauses are selected after only a few iterations, and the prover quickly diverges. Thus, we only consider the strongest criterion (3).
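The three criteria translate directly into set operations. The sketch below is a hypothetical C encoding that represents $V$, $P$, $C_j$ and $L_i$ as 64-bit masks, so it assumes at most 64 constants and 64 predicate vertices; `crit_weak`, `crit_threshold` and `crit_strong` correspond to criteria (1), (2) and (3).

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Set;   /* bit k set <=> symbol k belongs to the set */

static int card(Set s) {
    int n = 0;
    while (s) { s &= s - 1; n++; }   /* clear the lowest set bit */
    return n;
}

/* (1) at least one relevant constant or one relevant predicate */
static int crit_weak(Set V, Set P, Set Cj, Set Li) {
    return (V & Cj) != 0 || (P & Li) != 0;
}

/* (2) ratio of relevant constants above t_v, or of relevant predicates above t_p */
static int crit_threshold(Set V, Set P, Set Cj, Set Li, double tv, double tp) {
    assert(tv >= 0.0 && tp >= 0.0);
    int by_const = Cj != 0 && (double)card(V & Cj) / card(Cj) > tv;
    int by_pred = Li != 0 && (double)card(P & Li) / card(Li) > tp;
    return by_const || by_pred;
}

/* (3) every constant and every predicate of the clause is relevant */
static int crit_strong(Set V, Set P, Set Cj, Set Li) {
    return (V & ~Cj) == 0 && (P & ~Li) == 0;
}
```

For a clause with constants $\{0, 1\}$ and predicate $\{0\}$, criterion (1) already holds with $C_j = \{0\}$, whereas criterion (3) waits until constant 1 also enters $C_j$, which is why (3) keeps the reduced VCs small.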

We have also often observed that only one conjunct of a universally quantified hypothesis is relevant. In that case, we split the conjunctive hypothesis into its parts and apply the filtering criterion to each resulting part. A special case arises when all the parts of a splittable hypothesis are relevant according to the criterion: we then keep the original formula, in order to preserve its structure, which can be exploited by provers.

5.3.2 Context Axioms

Consider now the case of selecting relevant axioms from the context. Intuitively, an axiom of the context has to be selected if one of the predicate relations it defines is relevant for one hypothesis, i.e. if the corresponding arc is used in the computation of $L_i$. In practice, for each arc traversed while generating $L_i$, we keep all the context axioms that generated this arc.

5.4 Selection Strategy

The selection strategy used in the experiments of this work is described in Fig. 5. The algorithm takes three input parameters:

• a VC whose satisfiability has to be checked,

• a satisfiability solver Prover, and

• a maximal amount of time TO given by the user to the satisfiability solver to discharge the VC.

    Parameters: VC, Prover, TO

    // Prover call without VC reduction
    Res := Prover(VC, TO)
    if Res = timeout then
        i_max := 1 + Min depth giving reachable preds(VC)
        j_max := 1 + Min depth giving reachable vars(VC)
        i := 0
        j := 0
        while Res ≠ unsat ∧ i < i_max do
            // Prover call after VC reduction
            Res := Prover(selection(VC, i, j), TO)
            j := j + 1
            if j > j_max then
                i := i + 1
                j := 0
    return Res

Figure 5: General Algorithm Discharging a VC with Axiom Selection

The algorithm starts with a first attempt to discharge the VC without axiom selection. It stops if this first result is unsatisfiable or satisfiable. Notice that in the latter case, removing axioms cannot modify the result, since a satisfiable formula remains satisfiable when axioms are removed. Otherwise, Prover is called following an incremental, constant-first selection.

The two natural numbers $i_{max}$ and $j_{max}$ are depth bounds for $L_i$ and $C_j$, computed during the predicate graph and constant graph traversals. Since we want to reach $L_\infty$ and $C_\infty$, $i_{max}$ and $j_{max}$ are initially computed by the tool as one plus the minimal depth needed to obtain all reachable predicates and constants. The tool interprets this value as the $\infty$ depth, according to Sections 5.2 and 5.1 (all predicates and constants of the graphs).

The selection function implements the selection of axioms (from the context or the hypotheses) according to the strongest criterion (3). Submitting the resulting reduced VC to a prover can yield three outcomes: satisfiable, unsatisfiable or timeout.

1. If the formula is declared to be unsatisfiable, the procedure ends. Adding more axioms cannot make the problem satisfiable.

2. If the formula is declared to be satisfiable, we may have omitted some axioms; we are then left to increment either i or j, i.e. to enlarge either the set of selected predicates or the set of selected constants.

   However, adding new predicates has a more critical impact than adding new constants, since constants do not appear in context axioms. Therefore we recommend first incrementing j, enlarging $C_j$ until eventually $C_\infty$, before incrementing i. In the latter case, j is reset to 0.

3. If the formula is not discharged within the given time, even after i and j have been iteratively incremented, the algorithm terminates.
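The loop of Fig. 5 can be transcribed in C as below. This is a sketch under explicit assumptions: the prover is abstracted as a callback (`Prover`) that receives the selection depths `i` and `j`, with `-1, -1` standing for the unreduced VC, and `toy_prover` is a made-up stand-in for Simplify, Alt-Ergo or Yices that only succeeds once enough axioms are selected. `i_max` and `j_max` are taken as inputs, since computing them requires the graph traversals of Sections 5.1 and 5.2.

```c
#include <assert.h>

typedef enum { RES_UNSAT, RES_SAT, RES_TIMEOUT } Result;

/* Prover callback: i = j = -1 means the full VC; otherwise the VC reduced
   to the axioms selected with predicate depth i and constant depth j. */
typedef Result (*Prover)(void *ctx, int i, int j, double timeout);

/* Transcription of Fig. 5: incremental, constant-first selection. */
static Result discharge(Prover prover, void *ctx, int i_max, int j_max, double to) {
    assert(i_max >= 0 && j_max >= 0);
    Result res = prover(ctx, -1, -1, to);   /* call without VC reduction */
    if (res != RES_TIMEOUT)
        return res;                         /* unsat, or sat: removing axioms cannot help */
    int i = 0, j = 0;
    while (res != RES_UNSAT && i < i_max) {
        res = prover(ctx, i, j, to);        /* call after VC reduction */
        j = j + 1;
        if (j > j_max) {                    /* C_j exhausted: widen L_i, restart C_j */
            i = i + 1;
            j = 0;
        }
    }
    return res;
}

/* Hypothetical prover: times out on the full VC, proves the reduced VC once
   i >= 1 and j >= 2, and answers "sat" (missing axioms) before that.
   ctx points to a call counter. */
static Result toy_prover(void *ctx, int i, int j, double timeout) {
    (void)timeout;
    ++*(int *)ctx;
    if (i < 0)
        return RES_TIMEOUT;
    return (i >= 1 && j >= 2) ? RES_UNSAT : RES_SAT;
}
```

With `i_max = 6` and `j_max = 3`, the toy prover is called 8 times: once on the full VC, then on the reduced VCs for (i, j) = (0,0) to (0,3), (1,0), (1,1) and (1,2), the last of which is unsatisfiable. With `i_max = 1`, the loop stops after (0,3) and returns the last answer, satisfiable.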

6. EXPERIMENTS 

The proposed approach takes place in the broader context of annotated C program certification. T. Hubert and C. Marché [16] have proposed a separation analysis that strongly simplifies the verification conditions generated by a weakest precondition calculus, and thus greatly helps to prove programs with pointers. Their approach is supported by the Why tool. The pruning heuristics presented here are developed as a post-processing step of this tool.

Section 6.1 gives some implementation and experimentation details. Section 6.2 presents experimental results on an industrial case study for trusted computing. This case study raises new challenges associated with the certification of C programs annotated with a temporal logic formula. Section 6.3 finally gives results obtained on a public benchmark.

6.1 Methodology 

All the strategies presented in this work are implemented in OCaml as modules of the Why [12] tool, in fewer than 1700 lines of code. Since these criteria are heuristics, their use is optional, and Why has command-line arguments that allow a user to enable or disable them. In the current version, several other heuristics have been developed; they are not considered here because their impact on the performance of Why seems less obvious. In order to use the presented algorithms, the arguments to include in the Why call are:

    --prune-with-comp --prune-context --prune-coarse-pred-comp
    --prune-vars-filter CNF

The first parameter includes comparison predicates in the predicate dependency graph. The second one requests filtering not only hypotheses but also axioms from the context. The third one makes the selection ignore arc weights; this option gives better execution times on the Oslo benchmark. Finally, the fourth argument requests rewriting hypotheses into CNF before filtering.

The whole experiment is run on an Intel T8300 @ 2.4 GHz with 4 GB of memory, running x86_64 Ubuntu Linux.

6.2 Results of Oslo Verification 

First of all, among the 771 generated VCs, 741 are discharged directly, without any axiom selection. Next, the approach developed in [5] raises this number to 752 VCs.

Among the remaining unproved VCs, some rely on quantified hypotheses and others need comparison predicates that are not handled in the previous work [5]. They have motivated the present extensions, namely CNF reduction, comparison handling and context reduction. Thanks to these improvements, 10 more VCs are automatically proved by using the algorithm described in Fig. 5 with the three provers Simplify, Alt-Ergo 0.8 and Yices 1.0.20, with a timeout TO of 10 seconds.

The $i_{max}$ and $j_{max}$ limits depend on the VCs. Their observed values do not exceed $i_{max} = 6$ and $j_{max} = 7$. These limits bound the number of successively reduced versions of a VC that are submitted to the prover. If edge weights are considered, then $i_{max}$ grows up to 18 and the execution time doubles. Figure 6 sums up these results.

[Bar chart; legend: Without Pruning, Former work [5], Present Work]
Figure 6: Result Comparison on Oslo Benchmark (771 VCs)

6.3 Public Why Benchmark 

Our approach is developed in the Why tool, which translates Why syntax into the input syntax of several proof assistants (Coq, HOL 4, HOL Light, Isabelle/HOL, Mizar, PVS) and automated theorem provers (Alt-Ergo, CVC3, Simplify, Yices, Z3). This section shows some experimental results on the Why public benchmark¹.

The Why benchmark is a public collection of VCs generated by Caduceus or Krakatoa. These tools generate VCs respectively from C and Java programs, according to CSL and JML specifications. It therefore partially matches our requirements, since our work focuses on the verification of VCs generated by these tools. The only limitation is that our method targets VCs with a large number of hypotheses, unlike the ones in this benchmark.

This benchmark is provided in two versions, corresponding to two different preprocessings. Our results are similar with both versions. Alt-Ergo discharges 1260 VCs directly and 1297 VCs with axiom selection, while axiom selection adds 3 VCs to the 1310 VCs directly discharged by Simplify.

7. RELATED WORK AND CONCLUSION 

We have presented a new strategy to select relevant hypotheses in formulae coming from program verification. To do so, we have combined two separate dependency analyses based on graph computation and graph traversal. Moreover, we have given some heuristics to analyse the graphs with sufficient granularity. Finally, we have shown the relevance of this approach on a benchmark derived from real industrial code.

¹ http://proval.lri.fr/why-benchmarks/

Strategies to simplify the prover's task have been widely studied ever since automated provers have existed [28], mainly to propose more efficient deductive systems [28, 27, 26]. The KeY deductive system [2] is an extreme case. It is composed of a large list of special-purpose rules dedicated to JML-annotated JavaCard programs. These rules make an explicit axiomatization of data types, memory model and program execution unnecessary. Priorities between deduction rules help in effective reasoning. Beyond this, choosing rules in that framework requires as much effort as choosing axioms when targeting general-purpose theorem provers.

The present work can be compared with the set of support (sos) selection strategy [28, 20]. This approach starts by asking the user to provide an initial sos, classically the negated conclusion and a subset of the hypotheses. Inferences are then restricted to those involving at least one clause of the sos, their consequences being added to the sos. Our work can also be viewed as an automatic guess of the initial sos, guided by the formula to prove. In this sense, it is close to [18], where initial relevant clauses are selected according to syntactical criteria, i.e. by counting matching rates between the symbols of any clause and the symbols of clauses issued from the conclusion. By applying syntactical filtering to clauses issued from axioms and hypotheses, this latter work does not consider the relations between hypotheses formalized by the axioms of the theory: it provides a reduced forward proof. In contrast, by analyzing dependency graphs, we simulate natural deduction and are not far from backward proof search. By focusing on the predicative part of the verification condition, our objectives are dual to those developed in [14]: that work concerns boolean verification conditions with any boolean structure, whereas we treat predicative formulae whose symbols are axiomatized in a quantified theory.

Even with a large set of context axioms, each verification condition most of the time requires only a tiny portion of this context. In [23, 7], a strategy to select relevant context axioms is presented, but it requires a preliminary manual classification of the axioms. Our predicate graph computation makes this axiom classification automatic. Recent advances have been made in the direction of semantic selection of axioms [25, 21]. Briefly, at each iteration, the selection of each axiom depends on whether a computed valuation is a model of the axiom or not. By comparison, our syntactical axiom selection is more efficient, indeed linear in the size of the input formula.

In the near future, we plan to apply the strategy to other case studies. We also plan to investigate the impact on execution time of various strategies discharging the same list of verification conditions. We want to confirm or refute, on other benchmarks, that weighting predicate dependencies with a formula length has no positive impact on automaticity but has a significant negative impact on execution time. We also plan to integrate selection strategies in the Why tool or in a target automated theorem prover.